{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fJEJeV_xiy_Z"
      },
      "source": [
        "# Running database reconstruction attacks on the Iris dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TV3AYZZsiy_d"
      },
      "source": [
        "In this tutorial we will show how to run a database reconstruction attack on the Iris dataset and evaluate its effectiveness against models trained non-privately (i.e., naively with scikit-learn) and models trained with differential privacy guarantees."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qVrADCiRiy_d"
      },
      "source": [
        "## Preliminaries"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vt59Q5mwiy_e"
      },
      "source": [
        "The database reconstruction attack takes a trained machine learning model `model`, which has been trained by a training dataset of `n` examples.  Then, using `n-1` examples of the training dataset (i.e., with the target row removed), we seek to reconstruct the `n`th example of the dataset by using `model`.\n",
        "\n",
        "In this example, we train a Gaussian Naive Bayes classifier (`model`) with the training dataset, then remove a single row from that dataset, and seek to reconstruct that row using `model`. For typical examples, this attack is successful up to machine precision.\n",
        "\n",
        "We then show that launching the same attack on a ML model trained with differential privacy guarantees provides protection for the training dataset, and prevents learning the target row with precision."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YpfK4Nusiy_e"
      },
      "source": [
        "## Example usage"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pOVHT-5_iy_f"
      },
      "source": [
        "## Load data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ob4ob-t3iy_f"
      },
      "source": [
        "First, we load the data of interest and split into train/test subsets."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "pip install scikit-learn==1.1.0 diffprivlib"
      ],
      "metadata": {
        "id": "gby2EgKxi0V6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "-shsBUVtiy_f"
      },
      "outputs": [],
      "source": [
        "from sklearn import datasets\n",
        "from sklearn.model_selection import train_test_split\n",
        "import numpy as np\n",
        "\n",
        "dataset = datasets.load_iris()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "o45qkmULiy_h"
      },
      "outputs": [],
      "source": [
        "x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Znv0wYdjiy_h"
      },
      "source": [
        "## Train model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oP2zzF33iy_h"
      },
      "source": [
        "We can now train a Gaussian naive Bayes classifier using the full training dataset. This is the model that will be used to attack the training dataset later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "EzNvuHlUiy_h"
      },
      "outputs": [],
      "source": [
        "import sklearn.naive_bayes as naive_bayes\n",
        "from art.estimators.classification.scikitlearn import ScikitlearnGaussianNB\n",
        "\n",
        "model1 = naive_bayes.GaussianNB().fit(x_train, y_train)\n",
        "non_private_art = ScikitlearnGaussianNB(model1)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "bbYT5V5Giy_h",
        "outputId": "f7caad45-1c61-4aa2-9ad0-bde5521abb60"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Model accuracy (on the test dataset): 0.9\n"
          ]
        }
      ],
      "source": [
        "print(\"Model accuracy (on the test dataset): {}\".format(model1.score(x_test, y_test)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qLRPlGyWiy_i"
      },
      "source": [
        "## Launch and evaluate attack"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oVTAdiU9iy_i"
      },
      "source": [
        "We now select a row from the training dataset that we will remove. This is the **target row** which the attack will seek to reconstruct. The attacker will have access to `x_public` and `y_public`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "RW77vt0Oiy_i"
      },
      "outputs": [],
      "source": [
        "target_row = int(np.random.random() * x_train.shape[0])\n",
        "\n",
        "x_public = np.delete(x_train, target_row, axis=0)\n",
        "y_public = np.delete(y_train, target_row, axis=0)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e9uRQZDGiy_i"
      },
      "source": [
        "We can now launch the attack, and seek to infer the value of the target row. This is typically completed in less than a second."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nS7k0mT2iy_i"
      },
      "outputs": [],
      "source": [
        "from art.attacks.inference.reconstruction import DatabaseReconstruction\n",
        "\n",
        "dbrecon = DatabaseReconstruction(non_private_art)\n",
        "\n",
        "x, y = dbrecon.reconstruct(x_public, y_public)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Uu3h3tdaiy_i"
      },
      "source": [
        "We can evaluate the accuracy of the attack using root-mean-square error (RMSE), showing a high level of accuracy in the inferred value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iLI7OtBriy_i",
        "outputId": "5813f0c9-299c-4c84-9f6d-e26186e2798f"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Inference RMSE: 1.772445594194786e-08\n"
          ]
        }
      ],
      "source": [
        "print(\"Inference RMSE: {}\".format(\n",
        "    np.sqrt(((x_train[target_row] - x) ** 2).sum() / x_train.shape[1])))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yfFbmUMpiy_j"
      },
      "source": [
        "We can confirm that the attack also inferred the correct label `y`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FvLd1ZHriy_j",
        "outputId": "f872a5b3-b060-4bd0-e116-9251b53a3c96"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ],
      "source": [
        "np.argmax(y) == y_train[target_row]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kWDL5cLyiy_j"
      },
      "source": [
        "# Attacking a model trained with differential privacy"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xHjaXNYUiy_j"
      },
      "source": [
        "We can mitigate against this attack by training the public ML model with differential privacy.  We will use [diffprivlib](https://github.com/Trusted-AI/differential-privacy-library) to train a differentially private Gaussian naive Bayes classifier. We can mitigate against any loss in accuracy of the model by choosing an `epsilon` value appropriate to our needs."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_fRyH1bQiy_j"
      },
      "source": [
        "## Train the model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "C1SXrq8liy_j",
        "outputId": "97e391e7-2491-491b-c5bb-26db511dab86"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.9333333333333333"
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ],
      "source": [
        "from diffprivlib import models\n",
        "\n",
        "model2 = models.GaussianNB(bounds=([4.3, 2.0, 1.1, 0.1], [7.9, 4.4, 6.9, 2.5]), epsilon=3).fit(x_train, y_train)\n",
        "private_art = ScikitlearnGaussianNB(model2)\n",
        "\n",
        "model2.score(x_test, y_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OGvTeCtDiy_j"
      },
      "source": [
        "## Launch and evaluate attack"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_UNqtk1ziy_j"
      },
      "source": [
        "We then launch the same attack as before. In this case, the attack may take a number of seconds to return a result."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "id": "JxpObtA9iy_j"
      },
      "outputs": [],
      "source": [
        "dbrecon = DatabaseReconstruction(private_art)\n",
        "\n",
        "x_dp, y_dp = dbrecon.reconstruct(x_public, y_public)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z0uk8uu7iy_j"
      },
      "source": [
        "In this case, the RMSE shows our attack has not been as successful"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SX-iZBtZiy_j",
        "outputId": "de6267b0-8597-4cac-9dd3-6f82dae53e2f"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Inference RMSE (with differential privacy): 1.5083110802156137\n"
          ]
        }
      ],
      "source": [
        "print(\"Inference RMSE (with differential privacy): {}\".format(\n",
        "    np.sqrt(((x_train[target_row] - x_dp) ** 2).sum() / x_train.shape[1])))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zg8w22Jsiy_k"
      },
      "source": [
        "This is confirmed by inspecting the inferred value and the true value."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "scrolled": false,
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SULTN1Hdiy_k",
        "outputId": "5dc0c7a3-ffe1-4221-ddd9-15cb06119a66"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(array([[7.70000114, 3.00000085, 6.10000039, 2.30000102]]),\n",
              " array([5.5, 2.6, 4.4, 1.2]))"
            ]
          },
          "metadata": {},
          "execution_count": 27
        }
      ],
      "source": [
        "x_dp, x_train[target_row]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CkO1hUVOiy_k"
      },
      "source": [
        "In fact, the attack may not even be able to correctly infer the target label."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "XJdnnSDNiy_k",
        "outputId": "7a11e79b-ccd7-408f-8250-35445cbbb31c"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "(0, 1)"
            ]
          },
          "metadata": {},
          "execution_count": 28
        }
      ],
      "source": [
        "np.argmax(y_dp), y_train[target_row]"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}